Chomsky–Schützenberger theorem

In formal language theory, the Chomsky–Schützenberger theorem is a statement about the number of words of a given length generated by an unambiguous context-free grammar. The theorem provides an unexpected link between the theory of formal languages and abstract algebra. It is named after Noam Chomsky and Marcel-Paul Schützenberger.

Statement of the theorem

In order to state the theorem, we need a few notions from algebra and formal language theory.

A power series over \mathbb{N} is an infinite series of the form

f = f(x) = \sum_{k=0}^\infty a_k x^k = a_0 %2B a_1 x^1 %2B a_2 x^2 %2B a_3 x^3 %2B \cdots

with coefficients a_k in \mathbb{N}. The multiplication of two formal power series f and g is defined in the expected way as the convolution of the sequences a_n and b_n:

f(x)\cdot g(x) = \sum_{k=0}^\infty \left(\sum_{i=0}^k a_i b_{k-i}\right) x^k.

In particular, we write f^2 = f(x)\cdot f(x), f^3 = f(x)\cdot f(x)\cdot f(x), and so on. In analogy to algebraic numbers, a power series f(x) is called algebraic over \mathbb{Q}(x), if there exists a finite set of polynomials p_0(x), p_1(x), p_2(x), \ldots , p_n(x) each with rational coefficients such that

p_0(x) %2B p_1(x) \cdot f %2B p_2(x)\cdot f^2 %2B \cdots %2B p_n(x)\cdot f^n = 0.

Having established the necessary notions, the theorem is stated as follows.

Chomsky–Schützenberger theorem. If L is a context-free language admitting an unambiguous context-free grammar, and a_k�:= | L \ \cap \Sigma^k | is the number of words of length k in L, then \sum_{k = 0}^\infty a_k x^k is a power series over \mathbb{N} that is algebraic over \mathbb{Q}(x).

Proofs of this theorem are given by Kuich & Salomaa (1985), and by Panholzer (2005).

References